Overview

This notebook briefly explores whether the colonies from the public inventory differ detectably from those from the structured sampling. It is by no means clear that I actually have the data to do this, particularly since some traits change along the elevational gradient. Nevertheless, the consensus at lab meeting was that it should be investigated.

By imaging order

wkr.df %>% filter(Trait %in% focus_traits) %>%
  ggplot(aes(x=img_order, y=Value_std)) + 
  geom_point(shape=1, aes(colour=source)) + 
  stat_smooth(method="loess", colour="black", size=0.5) +
  stat_smooth(method="loess", size=0.5, aes(colour=source)) +
  facet_wrap(~Trait) + scale_colour_brewer(type="qual", palette=2)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 117 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 117 rows containing non-finite values (stat_smooth).
## Warning: Removed 117 rows containing missing values (geom_point).

clny.df %>% filter(Trait %in% focus_traits) %>%
  ggplot(aes(x=img_order, y=mnValue_std)) + 
  geom_point(shape=1, aes(colour=source)) + 
  stat_smooth(method="loess", colour="black", size=0.5) +
  stat_smooth(method="loess", size=0.5, aes(colour=source)) +
  facet_wrap(~Trait) + scale_colour_brewer(type="qual", palette=2)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 54 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 54 rows containing non-finite values (stat_smooth).
## Warning: Removed 54 rows containing missing values (geom_point).

Boxplots

wkr.df %>% filter(Trait %in% focus_traits) %>%
  ggplot(aes(x=Value_std, y=Trait, fill=source)) + 
  geom_boxplot()

clny.df %>% filter(Trait %in% focus_traits) %>%
  ggplot(aes(x=mnValue_std, y=Trait, fill=source)) + 
  geom_boxplot()

clny.df %>% filter(Trait %in% focus_traits) %>%
  ggplot(aes(x=log(sdValue_std), y=Trait, fill=source)) + 
  geom_boxplot()
## Warning: Removed 315 rows containing non-finite values (stat_boxplot).
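
The warning above says that 315 values of log(sdValue_std) were non-finite. The following is a minimal sketch of two ways that can happen, assuming (this is not confirmed anywhere in the notebook) that some colonies have only one measured worker, or workers with identical values. The data frame `toy` and its values are made up for illustration.

```r
# Toy per-worker data (hypothetical values, not from the notebook)
toy <- data.frame(
  colony = c("A", "A", "A", "B", "C", "C"),
  value  = c(1.0, 1.2, 0.9, 1.1, 0.8, 0.8)
)

# Per-colony sample size, standard deviation, and log standard deviation
n_wkr  <- tapply(toy$value, toy$colony, length)
sd_wkr <- tapply(toy$value, toy$colony, sd)
log_sd <- log(sd_wkr)

# Colony B has a single worker, so sd() returns NA.
# Colony C has identical values, so sd is 0 and log(0) is -Inf.
# Both are non-finite and would be dropped by a boxplot.
data.frame(
  colony = names(sd_wkr),
  n      = as.vector(n_wkr),
  sd     = as.vector(sd_wkr),
  log_sd = as.vector(log_sd),
  finite = is.finite(as.vector(log_sd))
)
```

Tabulating the non-finite rows by source in the real clny.df would show whether the dropped colonies are concentrated in one group, which would matter for the comparison.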

Across elevation

Colour

Let’s look at colour first:

ggplot(filter(wkr.df, Trait_orig=="v"), aes(mnt25, Value, colour=source)) +
  geom_point(shape=1) + stat_smooth(method="lm", se=FALSE) + 
  facet_wrap(~SPECIESID, scales="free_y")
## `geom_smooth()` using formula 'y ~ x'

wkr.df %>% filter(Trait_orig == "v") %>%
  ggplot(aes(x=img_order, y=Value_std)) + 
  geom_point(shape=1, aes(colour=source)) + 
  stat_smooth(method="lm", colour="black", size=0.5) +
  stat_smooth(method="lm", size=0.5, aes(colour=source)) +
  facet_wrap(~SPECIESID, scales="free_x") + 
  scale_colour_brewer(type="qual", palette=2)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 13 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 13 rows containing non-finite values (stat_smooth).
## Warning: Removed 13 rows containing missing values (geom_point).

There does seem to be a difference in colour between public and structured samples, at least on average across most species for which we have samples from both: the public samples tend to be darker. It could be that members of the public tended to collect the darker individuals from a given set of workers, or that, since I was pulling workers from underground, my samples included more light-coloured individuals. If colour darkens somewhat with age or exposure to sunlight, then collecting from the surface would select a darker subset of the total workers. Any methodological artefact is also confounded with the imaging order. What I really need is the grey value of the background across images. I could try to automate that somehow, or do it for a subset, but I can’t do it by hand for all 1706 images: even at 30 seconds per image, that’s about 14 hours.

The median grey value shows essentially the same thing:

ggplot(filter(wkr.df, Trait_orig=="grey_md"), aes(mnt25, Value, colour=source)) +
  geom_point(shape=1) + stat_smooth(method="lm", se=FALSE) + 
  facet_wrap(~SPECIESID, scales="free_y")
## `geom_smooth()` using formula 'y ~ x'
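
As a rough sketch of how the background grey value mentioned above might be automated, the code below takes the median grey value of a border strip around each image. It assumes that each image has already been read into a numeric matrix of grey values and that the specimen sits near the centre and never touches the border; neither is checked here. The helper `background_grey()`, the `border` width, the toy image, and the subset size of 200 are all hypothetical.

```r
# Hypothetical helper: median grey value of the outer border of an image,
# where `img` is a numeric matrix of grey values (rows x columns).
background_grey <- function(img, border = 5) {
  stopifnot(is.matrix(img), nrow(img) > 2 * border, ncol(img) > 2 * border)
  rows <- seq_len(nrow(img))
  cols <- seq_len(ncol(img))
  # TRUE for rows/columns within `border` pixels of an edge
  edge_row <- rows <= border | rows > nrow(img) - border
  edge_col <- cols <= border | cols > ncol(img) - border
  # A pixel belongs to the border strip if its row or its column is near an edge
  is_edge <- outer(edge_row, edge_col, "|")
  median(img[is_edge])
}

# Toy image: uniform background (200) with a darker specimen (60) in the centre
img <- matrix(200, nrow = 40, ncol = 60)
img[15:25, 20:40] <- 60
bg <- background_grey(img)

# Cost of the manual alternative at 30 seconds per image
n_images   <- 1706
hours_all  <- n_images * 30 / 3600

# A random subset is much cheaper to measure by hand
set.seed(1)
subset_ids    <- sample(seq_len(n_images), 200)
hours_subset  <- length(subset_ids) * 30 / 3600
```

Plotting the resulting background values against img_order and source would show whether image brightness drifted over the imaging sequence, which is the confound in question.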

Weber's Length

Now Weber's length:

ggplot(filter(wkr.df, Trait_orig=="WebersLength"), 
       aes(mnt25, Value, colour=source)) +
  geom_point(shape=1) + stat_smooth(method="lm", se=FALSE) + 
  facet_wrap(~SPECIESID, scales="free_y")
## `geom_smooth()` using formula 'y ~ x'

There are still some possible differences, but this time the public workers tend to be slightly bigger, and the difference isn’t as dramatic as it was for colour.

Leg Length

Now leg length (HindLen):

ggplot(filter(wkr.df, Trait_orig=="HindLen"), 
       aes(mnt25, Value, colour=source)) +
  geom_point(shape=1) + stat_smooth(method="lm", se=FALSE) + 
  facet_wrap(~SPECIESID, scales="free_y")
## `geom_smooth()` using formula 'y ~ x'

Structured samples often have longer legs…
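
The plots above compare sources by eye within each species. One way to put a number on the offset while allowing for the elevational trend is a linear model with elevation and source as predictors. The following is only a sketch on simulated data for a single hypothetical species: the data frame `toy`, the effect sizes, and the noise level are invented, and the column names simply mirror those used above (mnt25 on the x axis, Value, source). It is not a result from the real data.

```r
set.seed(1)
n <- 80

# Simulated workers for one hypothetical species
toy <- data.frame(
  mnt25  = runif(n, 400, 2000),
  source = factor(rep(c("public", "structured"), each = n / 2))
)

# Simulated trait: declines with mnt25, with a built-in offset of 0.3
# for structured samples (an arbitrary value chosen for illustration)
toy$Value <- 2 - 0.0005 * toy$mnt25 +
  0.3 * (toy$source == "structured") +
  rnorm(n, sd = 0.1)

# Additive model: common slope along mnt25, separate intercept per source
fit <- lm(Value ~ mnt25 + source, data = toy)

# The `sourcestructured` row estimates the offset after adjusting for mnt25
coef(summary(fit))
```

With the real data, species would need to be handled too, for example by fitting the model per SPECIESID or by adding it as a term, and an interaction (mnt25 * source) would allow the slopes to differ between sources as the per-source lines in the plots do.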